Abstract

Skeletal muscle is a major component of the human body. Age-related loss of muscle mass and function contributes to 
some public health problems such as sarcopenia and osteoporosis. Skeletal muscle, mainly composed of appendicular lean 
mass (ALM), is a heritable trait. Copy number variation (CNV) is a common type of human genome variant which may play 
an important role in the etiology of many human diseases. In this study, we performed genome-wide association analyses of 
CNV for ALM in 2,286 Caucasian subjects. We then replicated the major findings in 1,627 Chinese subjects. Two CNVs, 
CNV1191 and CNV2580, were detected to be associated with ALM (p = 2.26 × 10⁻² and 3.34 × 10⁻³, respectively). In the
Chinese replication sample, the two CNVs achieved p-values of 3.26 × 10⁻² and 0.107, respectively. CNV1191 covers a gene,
GTPase of the immunity-associated protein family (GIMAP1), which is important for skeletal muscle cell survival/death in 
humans. CNV2580 is located in the Serine hydrolase-like protein (SERHL) gene, which plays an important role in normal 
peroxisome function and skeletal muscle growth in response to mechanical stimuli. In summary, our study suggested two 
novel CNVs and the related genes that may contribute to variation in ALM. 
Introduction

Loss and functional impairment of skeletal muscle, especially in
the elderly, are related to a number of public health problems 
(such as sarcopenia, osteoporosis) and increased mortality [1,2]. 
Whole lean body mass (LBM) is composed of skeletal muscle 
(~60%), viscera, and some other connective tissues. Appendicular 
lean mass (ALM) is the sum of skeletal muscle mass in the arms and legs,
which is the primary portion of skeletal muscle involved in
ambulation and physical activities. ALM is considered to be an 
ideal measure for skeletal muscle mass [3,4,5,6]. ALM can be 
measured accurately by dual energy X-ray absorptiometry (DXA). 

Skeletal muscle is under strong genetic control, with heritability 
estimates of 30-85% for muscle strength and 50-80% for muscle 
mass [7,8]. Genome wide association studies have identified a 
number of variants that may account for variation in ALM [9,10]. 
However, collectively, the identified loci/genes/variants only 
explain a small fraction of genetic variation in ALM, and the 
majority of the genetic determination remains to be revealed. 
Traditional association studies have focused on single nucleotide 
polymorphisms (SNPs). Studies on other types of genetic variants,
which may account for the "missing" heritability, have been
relatively rare.

Recent studies have shown that copy number variation (CNV) 
plays an important role in human diseases, such as schizophrenia 
[11,12], Parkinson's disease [13], and autism [14]. CNV is a 
common type of genomic variability in which DNA fragments
ranging in size from one kilobase to several megabases are
present at variable copy numbers in comparison with a reference
genome [15]. CNV may influence gene expression, phenotypic
variation and adaptation by disrupting coding or altering gene 
dosage [16,17,18,19]. Furthermore, it may affect gene expression 
indirectly through position effects, predispose to deleterious 
genetic changes, or provide substrates for chromosome change 
in evolution [15,20,21,22]. A recent GWAS of CNVs in Chinese 
identified the gremlin 1 gene that was associated with LBM 
variation [23]. However, to date, no study has been performed to 
investigate whether CNVs contribute to ALM in other ethnic 
groups such as Caucasians. 

In this study, we performed a CNV-based GWAS to identify 
genetic loci influencing variation in ALM in 2,286 Caucasian 
subjects. Follow-up replication analyses were performed in a 
Chinese population consisting of 1,627 subjects.
Table 1. Basic characteristics of the study subjects.

                  Discovery Sample (Caucasian)                       Replication Sample (Chinese)
                  Total            Male             Female           Total            Male             Female
No. of subjects   2,286            558              1,728            1,627            802              825
Age               51.37 (13.75)    50.71 (16.05)    51.59 (12.92)    34.49 (13.24)    31.43 (7.97)     37.46 (13.77)
Height (cm)       164.81 (43.03)   171.66 (70.64)   162.61 (28.65)   164.25 (8.16)    170.27 (5.96)    158.38 (5.22)
Weight (kg)       73.86 (42.67)    83.23 (67.08)    70.83 (30.33)    65.72 (9.61)     65.74 (9.64)     54.63 (8.09)
FBM (kg)          19.34 (5.65)     20.67 (9.30)     20.92 (13.20)    14.01 (2.54)     11.86 (5.10)     16.13 (4.90)
ALM (kg)          22.58 (2.68)     29.92 (4.84)     20.22 (3.54)     19.79 (1.94)     23.95 (3.20)     15.74 (2.10)

Note: The numbers within parentheses are standard deviations (SD).
doi:10.1371/journal.pone.0089776.t001



Materials and Methods 

Ethics Statement 

The study was approved by Institutional Review Boards of 
Creighton University, University of Missouri-Kansas City, Hunan 
Normal University of China and Xi'an Jiaotong University of 
China. Signed informed-consent documents were obtained from 
all study participants before they entered the study. 

Subjects 

The discovery sample consisted of 2,286 unrelated Caucasian
subjects of European origin recruited in the Midwestern US
(Kansas City, Missouri and Omaha, Nebraska). The inclusion and
exclusion criteria were described in our previous publications [24].

The replication sample was an independent Chinese sample
containing 1,627 unrelated subjects. All subjects were recruited from the
cities of Xi'an and Changsha and their neighboring areas in
China.

Phenotyping 

Anthropometric measures and a structured questionnaire 
covering lifestyle, diet, family information, medical history, etc. 
were obtained for all the study subjects. ALM and fat body mass 
(FBM) were measured using a dual-energy X-ray absorptiometry 
scanner Hologic QDR 4500 W (Hologic Inc., Bedford, MA, 



Table 2. CNVs that achieved a p value of 0.05 or less in the discovery and replication samples.

CNV        Chr   Start          End            p-value¹       p-value²
CNV2563    22    23,993,985     24,248,712     1.70 × 10⁻⁵    0.810
CNV11821   11    5,228,247      5,230,232      4.66 × 10⁻⁵    0.995
CNV148     1     195,089,940    195,168,372    2.32 × 10⁻⁴    0.864
CNV160     1     213,560,092    213,565,727    7.34 × 10⁻⁴    0.614
CNV1610    10    58,880,511     58,880,997     1.50 × 10⁻³    0.698
CNV2057    15    19,803,370     20,089,386     1.63 × 10⁻³    0.173
CNV11449   8     36,194,697     36,197,883     2.09 × 10⁻³    0.731
CNV2580    22    41,234,550     41,276,824     3.34 × 10⁻³    0.107
CNV575     4     34,455,420     34,500,578     3.71 × 10⁻³    0.485
CNV2546    22    20,055,998     20,175,294     5.29 × 10⁻³    0.647
CNV2694    23    139,324,076    139,328,860    2.16 × 10⁻²    0.963
CNV1191    7     149,916,734    149,932,502    2.26 × 10⁻²    3.26 × 10⁻²
CNV770     5     17,644,656     17,698,273     2.61 × 10⁻²    0.810
CNV1004    6     126,225,385    126,228,469    2.89 × 10⁻²    0.659
CNV200     2     24,460,486     24,464,632     2.97 × 10⁻²    0.706
CNV825     5     86,151,134     86,154,902     3.12 × 10⁻²    0.463
CNV2529    21    43,794,765     43,797,240     3.16 × 10⁻²    0.542
CNV2417    19    47,986,230     48,149,894     3.18 × 10⁻²    0.695
CNV1282    8     24,201,375     24,207,011     4.01 × 10⁻²    0.311
CNV2430    19    56,834,427     56,840,009     4.66 × 10⁻²    0.721

Notes:
¹ In discovery samples.
² In replication samples.
Chr: chromosome.
doi:10.1371/journal.pone.0089776.t002
Table 3. Association of normal_deletion and normal_duplication with CNVs.

CNV       Sample     Pair                β              SE       p-value        Combined p-value
CNV2580   Caucasian  normal_deletion     -5.53 × 10⁻³   71.82    0.94           0.99
                     normal_duplication  -4.16 × 10⁻²   15.59    7.71 × 10⁻³    6.84 × 10⁻³
          Chinese    normal_deletion     -9.27 × 10⁻³   152.84   0.95
                     normal_duplication  -4.40 × 10⁻²   27.23    0.11
CNV1191   Caucasian  normal_deletion     -1.20 × 10⁻²   10.43    0.25           0.04
                     normal_duplication  -21.3 × 10⁻²   72.84    3.46 × 10⁻³    0.02
          Chinese    normal_deletion     4.70 × 10⁻²    21.88    0.03
                     normal_duplication  9.84 × 10⁻³    94.49    0.92

Notes:
β, the standardized regression coefficient, was estimated in kilograms for ALM.
SE: Standard error.
doi:10.1371/journal.pone.0089776.t003



USA) for all study samples. ALM (kg) was calculated as the
sum of lean soft tissue (nonfat, non-bone) mass in the arms and
legs. Weight was measured in light indoor clothing, using a
calibrated balance beam scale, and height was measured
without shoes using a calibrated stadiometer.

Genotyping 

Genomic DNA was extracted from peripheral blood leukocytes
using standard protocols. The Genome-Wide Human SNP Array 6.0
(Affymetrix, Santa Clara, CA, USA), which includes 906,600
SNPs and 940,000 copy number probes, was used to genotype
each subject from the discovery sample, according to the
Affymetrix protocol. Briefly, approximately 250 ng of genomic
DNA was digested with restriction enzyme NspI or StyI. Digested
DNA was adaptor-ligated and PCR-amplified for each sample.
Fragmented PCR products were then labeled with biotin, denatured,
and hybridized to the arrays. Arrays were then washed and stained
using phycoerythrin on an Affymetrix Fluidics Station, and scanned
using the GeneChip Scanner 3000 7G to quantitate fluorescence
intensities. Data management and analyses were conducted using
the Genotyping Command Console Software. For sample quality
control (QC), a contrast QC threshold was set at the default value of
greater than 0.4. The final average contrast QC across the entire
sample reached a high level of 2.76 for our Caucasian cohort and
2.62 for our Chinese cohort.



Copy Number Analysis 

Common CNVs were identified using the CANARY algorithm 
implemented in the Birdsuite software [25], which utilized a 
previously defined copy number polymorphism (CNP, namely 
CNV with frequency greater than 1%) map based on HapMap 
samples [26]. In total, 1,216 CNPs were genotyped for the subjects
of the discovery sample and 1,280 CNPs for those of the replication sample.

QC 

We conducted QC filtering both at the sample level and the 
CNV level, according to the previously reported methods [27]. 

Figure 1. ALM in groups with different copy number (CN) of CNV1191 in the discovery and replication samples.
doi:10.1371/journal.pone.0089776.g001

Figure 2. ALM in groups with different CN of CNV2580 in the discovery and replication samples.
doi:10.1371/journal.pone.0089776.g002

First, for the sample-level QC, we used three quality metrics
reported by the Birdseye method to evaluate the initial 2,286
subjects for quality in copy number genotyping. The following
procedures were adopted: 1) we removed any sample that was
more than three standard deviations (SD) above or below the
average estimate of copy number, which was approximately two
copies at the genome-wide level; 2) we calculated the variability in
copy number and SNP probe intensities, each standardized
per chromosome, and removed any sample that was more than three
SD from the genome-wide average of these estimates; 3) we
removed any sample in which more than two chromosomes failed
any of these three metrics, i.e., more than three SD in estimated
copy number or excessive CNV or SNP variability for the
chromosome. According to the above criteria, 71 subjects were discarded. The
copy numbers of the remaining 2,215 subjects were successfully
genotyped using the CANARY software.
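
The three-SD rule used in the sample-level QC can be illustrated with a minimal Python sketch. It is not the Birdseye implementation: the function name `flag_outlier_samples` and the example values are hypothetical, and the sketch applies the rule to a single per-sample metric (for instance, the genome-wide average copy number estimate).

```python
import statistics


def flag_outlier_samples(values, n_sd=3.0):
    """Return indices of samples lying more than n_sd SDs from the mean.

    `values` holds one QC metric per sample (hypothetical input, e.g. the
    genome-wide average copy number estimate for each subject).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # No spread at all, so no sample can be an outlier.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd]


# Illustrative data only: 30 samples near two copies and one aberrant sample.
metric = [2.0 + 0.01 * ((i % 5) - 2) for i in range(30)] + [3.5]
print(flag_outlier_samples(metric))
```

In practice the same check would be repeated for each of the three metrics, and the per-chromosome failures would be counted before a sample is dropped.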

Second, we conducted QC filtering at the CNV level. Out of the
initially called CNVs, we excluded those with an uncertain or missing
copy call rate of >5% or with a minor variant frequency of <1%.
With the above
QC criteria, a total of 410 CNVs remained in the subsequent
analyses for the Caucasian sample.
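
The CNV-level filter can be sketched in a few lines of Python. The sketch rests on two assumptions that the text does not spell out: missing or uncertain calls are encoded as `None`, and the "minor variant frequency" is taken to be the fraction of called samples whose copy number differs from the most common copy number. The function name `passes_cnv_qc` is hypothetical.

```python
from collections import Counter


def passes_cnv_qc(calls, max_missing=0.05, min_variant_freq=0.01):
    """Apply the two CNV-level QC thresholds to one CNV.

    `calls` is a list of integer copy numbers, with None for a missing or
    uncertain call (an assumed encoding).
    """
    if not calls:
        return False
    called = [c for c in calls if c is not None]
    missing_rate = 1.0 - len(called) / len(calls)
    if not called or missing_rate > max_missing:
        return False
    # Assumed definition: share of samples not carrying the modal copy number.
    modal_count = Counter(called).most_common(1)[0][1]
    variant_freq = 1.0 - modal_count / len(called)
    return variant_freq >= min_variant_freq


# Illustrative calls: 2% missing, about 8% of called samples with CN = 3.
print(passes_cnv_qc([2] * 90 + [3] * 8 + [None] * 2))
```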

Statistical Analyses 

Association analyses of CNV were performed using a linear
regression model fitted with the glm function in R [28]. For both the initial
GWAS and subsequent replication studies, stepwise regression was
performed to screen the effects of covariates on ALM variation.
Age, sex, height, and FBM were significant covariates (p < 0.05) and
raw ALM values were adjusted for these factors. We adjusted for
covariates by a 2-stage procedure where the outcomes were
regressed on covariates only, and then the resulting residuals were
regressed on CNVs. To correct for the effect of potential
population stratification, we conducted a principal component
analysis on genome-wide SNP data with EIGENSTRAT [29] and
included the top ten principal components as covariates. Fisher's
method [30] was used to combine the p-values from the discovery
sample and replication sample.
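
The 2-stage covariate adjustment can be illustrated with a small, dependency-free Python sketch. It is not the R code used in the study: the helper names (`ols`, `residuals`, `two_stage_cnv_effect`) and the toy data are hypothetical, only two covariates are used, and p-values are not computed. Stage 1 regresses the outcome on covariates only; stage 2 regresses the stage-1 residuals on copy number.

```python
def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Augmented normal-equation matrix [X'X | X'y].
    m = [[sum(d[i] * d[j] for d in design) for j in range(k)]
         + [sum(d[i] * yi for d, yi in zip(design, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for j in range(col, k + 1):
                m[r][j] -= factor * m[col][j]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (m[i][k] - tail) / m[i][i]
    return beta


def residuals(rows, y):
    """Outcome minus fitted values from a regression of y on rows."""
    beta = ols(rows, y)
    return [yi - (beta[0] + sum(b * x for b, x in zip(beta[1:], r)))
            for r, yi in zip(rows, y)]


def two_stage_cnv_effect(covariates, copy_number, outcome):
    """Stage 1: adjust outcome for covariates. Stage 2: regress residuals on CN."""
    adjusted = residuals(covariates, outcome)
    return ols([[cn] for cn in copy_number], adjusted)[1]


# Toy balanced design (hypothetical values): copy number is uncorrelated with
# the covariates, and the outcome is built with a per-copy effect of -0.5.
toy = [(age, sex, cn) for age in (40.0, 60.0) for sex in (0.0, 1.0)
       for cn in (1.0, 2.0, 3.0)]
covariates = [(age, sex) for age, sex, _ in toy]
copy_number = [cn for _, _, cn in toy]
alm = [10.0 + 0.1 * age + 5.0 * sex - 0.5 * cn for age, sex, cn in toy]
print(round(two_stage_cnv_effect(covariates, copy_number, alm), 6))
```

When copy number is correlated with the covariates, the 2-stage estimate is attenuated relative to a single joint model, which is one reason to compare the two approaches.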



Results 

The basic characteristics of the subjects used in both discovery 
and replication samples are summarized in Table 1. 

In the discovery sample, 20 CNVs showed evidence for
association with ALM at a p value of 0.05 or less (Table 2). CNV1191
and CNV2580 were replicated in the Chinese sample. The p
values of CNV1191 in the discovery and replication samples were
2.26 × 10⁻² and 3.26 × 10⁻², respectively, and the p values of
CNV2580 in the discovery and replication samples were
3.34 × 10⁻³ and 0.107, respectively (Table 2). The combined p
values of the two CNVs were 6.05 × 10⁻³ and 3.27 × 10⁻³,
respectively.
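
As a worked illustration of Fisher's method, the following Python sketch combines independent p-values using the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combined_p` is hypothetical, and the closed-form tail probability used here holds because the degrees of freedom are even. Applied to the two CNV1191 p-values, it gives approximately 6.05 × 10⁻³, in line with the combined value reported above.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    With X = -2 * sum(ln p_i) and 2k degrees of freedom, the chi-square
    upper tail is exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


# Discovery and replication p-values for CNV1191 (Table 2).
print(fisher_combined_p([2.26e-2, 3.26e-2]))
```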

We further tested association between normal (CN = 2) and 
deletion (CN = 0, 1) groups, and between normal and duplication 
(CN = 3, 4) groups, separately. The results showed that while the 
direction of effect of CNV2580 was consistent in discovery and 
replication samples, it was not the case for CNV1191 (Table 3). 
However, both CNVs remained significant in the combined
analyses.

In addition to the 2-step adjustment procedure for covariates 
aforementioned, we performed association analyses where CNVs 
and covariates were included in a single model. The results were 
quite similar to those of the 2-step procedure (Table S1).

According to the UCSC Genome Browser on Human February
2009 (GRCh37/hg19) Assembly, CNV1191 is located at the
chromosome region 7q36.1, with physical position ranging from
149,916,734 bp to 149,932,502 bp, within the gene GTPase,



Table 4. The proportion of the subjects in each CN category of CNV2580.

        Caucasian                       Chinese
        Theoretical     Actual          Theoretical     Actual
CN = 0  1.16 × 10⁻⁵     0               4.72 × 10⁻⁵     0
CN = 1  4.00 × 10⁻³     5.00 × 10⁻³     3.00 × 10⁻³     3.00 × 10⁻³
CN = 2  0.76            0.77            0.76            0.77
CN = 3  0.22            0.20            0.22            0.19
CN = 4  0.02            0.02            0.02            0.03
GOF     0.22                            0.22

Note:
GOF: Goodness-of-fit.
doi:10.1371/journal.pone.0089776.t004
IMAP family member 1 (GIMAP1). The number of carriers with
CN = 0, 1, 2, 3 and 4 was 126, 855, 1,273, 28 and 4, respectively,
in the discovery sample. Due to the limited number of subjects
with CN = 4, we merged CN = 3 and CN = 4 into a single group.
The number of carriers with CN = 0, 1, 2, 3 was 42, 465, 1,093 and
27, respectively, in the replication sample. In the discovery sample,
carriers with CN = 1 and CN = 2 had higher ALM (22.7 kg and
22.6 kg, respectively) and carriers with CN = 3 had the lowest ALM (21.1 kg)
(Figure 1). Consistently, in the replication sample, carriers with
CN = 1 and CN = 2 had higher ALM (19.8 kg) and carriers with
CN = 3 had the lowest ALM (19.3 kg) (Figure 1).

CNV2580 is located at the chromosome region 22q13.2, with
physical position ranging from 41,234,550 bp to 41,276,824 bp,
within the gene serine hydrolase-like protein (SERHL). The
number of carriers with CN = 1, 2, 3, and 4 was 11, 1,763, 455,
and 57, respectively, in the discovery sample, and was 5, 1,257, 314
and 50, respectively, in the replication sample. Due to the limited
number of subjects with CN = 1, we merged CN = 1 and CN = 2
into a single group. In the discovery sample, carriers with CN = 2
and CN = 3 had higher ALM (22.6 kg and 22.7 kg, respectively)
and carriers with CN = 4 had the lowest ALM (21.7 kg) (Figure 2),
with an estimated β of -5.24 × 10⁻² (ALM in kg) for each copy
number. Consistently, in the replication sample, carriers with
CN = 2 and CN = 3 had ALM of 19.8 kg and 19.9 kg, respectively,
and carriers with CN = 4 had ALM of 18.5 kg (Figure 2), with an
estimated β of -4.34 × 10⁻² (ALM in kg) for each copy
number.

Table 4 lists the proportion of subjects in each CN category of
CNV2580. The table also includes the theoretical proportions
calculated based on empirical CN frequencies and the random mating
assumption. The goodness-of-fit (GOF) test showed that the empirical
distribution did not deviate from the theoretical distribution
(p = 0.22 for both populations).
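
One way to obtain theoretical CN proportions under random mating is sketched below in Python. This is an assumption about the general approach, not the authors' exact procedure, which is not described: each chromosome is taken to carry 0, 1, or 2 copies, and the diploid CN is the sum of two independently drawn chromosomes. The function name `expected_cn_proportions` and the allele frequencies are hypothetical and are not estimates from the study.

```python
def expected_cn_proportions(allele_freqs):
    """Expected diploid copy-number proportions under random mating.

    `allele_freqs` maps the number of copies carried on one chromosome to the
    frequency of that allele (frequencies should sum to 1).
    """
    expected = {}
    for copies_a, freq_a in allele_freqs.items():
        for copies_b, freq_b in allele_freqs.items():
            total = copies_a + copies_b
            expected[total] = expected.get(total, 0.0) + freq_a * freq_b
    return expected


# Hypothetical allele frequencies, for illustration only.
freqs = {0: 0.1, 1: 0.7, 2: 0.2}
for cn, prop in sorted(expected_cn_proportions(freqs).items()):
    print(cn, round(prop, 4))
```

The expected proportions can then be compared with the observed CN proportions in a goodness-of-fit test.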

There are two SNPs that are located in the region of CNV1191
and eight SNPs outside the CNV1191 boundaries but inside the
gene GIMAP1. None of these ten SNPs was significantly
associated with ALM in the discovery sample, but rs11769150 was
associated with ALM in the replication sample with a p-value of 0.02
(Table 5).

There are four SNPs that are located in the region of CNV2580
and fifteen SNPs outside the CNV2580 boundaries but inside the
gene SERHL. None of these nineteen SNPs was significantly
associated with ALM in the discovery sample, but two SNPs,
rs139116 and rs139120, were associated with ALM in the
replication sample with p-values of 0.02 (Table 5).

Discussion 

This is the first CNV-based GWAS for ALM in Caucasians. 
Two CNVs, CNV1191 and CNV2580, were identified to be 
associated with ALM. 

CNV1191 is located in the gene GIMAP1, which encodes
GTPase, IMAP family member 1. GIMAP (GTPase of the
immunity-associated protein family) proteins are a family of
putative GTPases believed to be regulators of cell death in
lymphomyeloid cells. GIMAP1 was the first reported member of
this gene family [31]. This gene is involved in the differentiation
of T helper (Th) cells of the Th1 lineage, and the related mouse
gene has been shown to be critical for the development of
mature B and T lymphocytes [32].

A study of myotubes cultured from skeletal muscle biopsies found
coordinated reduced expression of five members of the GIMAP family
(GIMAP1, GIMAP4, GIMAP5, GIMAP6 and GIMAP7), which
form a cluster on chromosome 7 and participate in skeletal muscle (SM) cell
survival/death [33]. A study in pig skeletal muscle indicated that
GIMAP1 was correlated with meat quality and regulation of
biological processes involved in the induction of apoptosis [34].
This gene is also involved in regulation of the lipid catabolic process,
defense response and positive regulation of calcium ion transport
[35]. Our findings, combined with the above evidence, support the
potential contribution of GIMAP1 to variation in skeletal muscle.

SERHL is a gene coding for a new member of the family of 
serine hydrolases that is located within peroxisomes [36]. In vivo 
studies showed that mRNA expression of SERHL increased in 
response to passive stretch imposed upon skeletal muscle [36]. 

The association directions of CNV1191 in the discovery and 
replication studies were different. This inconsistency may be 
explained by the following reasons. First, genetic variants may 
have different effects in different populations. A genetic variant 
may have different allele frequencies among diverse populations 
because of different evolutionary histories, which result in different
modes of genotype-phenotype association [37]. Second, significant
associations are usually found at molecular markers that are in
linkage disequilibrium (LD) with the causal variant, rather than at the
causal variant itself. Therefore, the inconsistency in direction could 
be a result of opposite patterns of LD between the two populations. 

Within the two CNV regions, we did not identify any
significant SNPs that were associated with ALM in the discovery 
sample. A possible explanation is that, different from SNP, CNV is 
a structural genetic variant that generally covers a larger genomic 
region and thus CNV may influence phenotypic variation by 
mechanisms that are different from SNP. 

In summary, we identified CNV1191 and CNV2580 that were
associated with ALM. The relevant genes, GIMAP1 and SERHL,
may play roles in skeletal muscle metabolism. Our findings may 
provide useful information for molecular functional studies of 
candidate genes for ALM. 